-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Issue 7/implicit functions #8
base: issue-7/implicit_functions
Are you sure you want to change the base?
Issue 7/implicit functions #8
Conversation
Thanks for the PR! "Algebraic equations" is how these are typically described in the literature. I'd like to continue to at least connect to the mainstream literature. Part of the motivation for writing this volume is to provide an accessible presentation. So I'd like to keep the presentations to the simplest possible level rather than the most general mathematical level. I'm just bringing this up as a criterion to consider this PR. I'll let @charlesm93 respond directly on the content. |
This reverts commit 2c459c3.
Hi @IvanYashchuk,
What I do in this section might make more sense once I've written the other sections on implicit functions. |
It is a common term, even used in the current draft Line 103 in 09c8556
There are many different PDEs and there are many different objective (target?) functions. AD rules are independent of the form of the objective functions. Stationary PDEs are nonlinear equations of form F(x) = 0, linear stationary PDEs can be written in the same form, time-dependent PDEs are often reduced to a sequence of stationary PDEs. Alright, I agree that the Lagrangian approach is needed for ODEs, etc. Anyways the actual rules for AD are more important than the derivation of them. |
Point taken. Adding a summary at the end, as you have done, with only the results plainly stated works well. These sections can be named "tangent linear method" and "adjoint method" (again, as you have done). The first sections, in which the differentiation algorithms are derived, can be named "Implicit function theorem" and "Lagrangian approach". The implicit function theorem allows you to derive the tangent linear and adjoint method.
Some properties of the objective function matter. A scalar objective motivates an adjoint method, while a long vector may warrant a tangent method.
Isn't an equation of the form F(x) = 0 simply an algebraic equation? To have a PDE, F must also depend on partial derivatives of x. Unless of course you can point me to an example where this is not the case.
It depends on the reader. For Stan developers, being able to extend results to other cases can sometimes be important, as the AD handbook may not be exhaustive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is very good. Please address the requested changes.
I'm still drafting this section (and the other chapters on implicit functions). In the future, please wait for a chapter to be finished and merged to the master branch before proposing a PR. You can always bring up issues. All that said, thank you for your help!
with one additional matrix multiplication operation | ||
\begin{equation*} | ||
\frac{\mathrm d j}{\mathrm d \psi}^{*} = - \frac{\partial f}{\partial \psi}^{*} \lambda | ||
+ \frac{\partial j}{\partial \psi}^{*}. | ||
\end{equation*} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pain to read without the pdf output. Can you summarize the change you made here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are derivations with rendered latex, though the symbols I used there are different
https://colab.research.google.com/drive/1zA75xSfsy2d7-7ojWoUbmCqS6CZy30m3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing I'm doing a code review, linking to similar code that does the calculations with different symbols is not good enough. The code you have doesn't compile, and I had to make a few corrections to render it. You use * for transpose, but this is inconsistent with previous notation where T is used. You also do not use []^{-1} as I have requested for other sections of your code.
Last but not least, it is unclear to me why you have replaced the original calculations.
I think I've addressed all the needed changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is still a bit of work to be done and you need to test that your code compiles.
(I realize that I probably misinterpreted your friendly suggestions as a formal PR, and I could've taken some of your suggestions without doing code review. But since we're almost done, we may as well make a merge.)
with one additional matrix multiplication operation | ||
\begin{equation*} | ||
\frac{\mathrm d j}{\mathrm d \psi}^{*} = - \frac{\partial f}{\partial \psi}^{*} \lambda | ||
+ \frac{\partial j}{\partial \psi}^{*}. | ||
\end{equation*} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seeing I'm doing a code review, linking to similar code that does the calculations with different symbols is not good enough. The code you have doesn't compile, and I had to make a few corrections to render it. You use * for transpose, but this is inconsistent with previous notation where T is used. You also do not use []^{-1} as I have requested for other sections of your code.
Last but not least, it is unclear to me why you have replaced the original calculations.
+ \frac{\partial f}{\partial u}\frac{\mathrm d u}{\mathrm d \psi} = 0 \\ | ||
\iff & \frac{\partial f}{\partial u}\frac{\mathrm d u}{\mathrm d \psi} | ||
= - \frac{\partial f}{\partial \psi} \\ | ||
\iff & \frac{\mathrm d u}{\mathrm d \psi} = - \left [\frac{\partial f}{\partial u} \right]^{-1} | ||
\frac{\partial f}{\partial \psi} | ||
\end{align*} | ||
where we assume the requisite differentiation and inversion are possible. | ||
The linear system that is solved for $\frac{\mathrm d u}{\mathrm d \psi}$ is called the _tangent linear system_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may be true but I don't see the point of introducing this notion here.
Hello @charlesm93, here are my friendly suggestions on the section you wrote.
I propose renaming "algebraic equations" -> "nonlinear system" as the same results hold for this more general case (including also PDEs).
Implicit function theorem just states that du/dpsi exists under certain conditions, therefore, I renamed the section to "tangent linear method".
I think that there is no need to introduce the Lagrangian approach for the adjoint method, as it is naturally derived from the tangent method. I propose to remove that section.
In "practical considerations" I have added the expressions needed for the implementation of forward and reverse AD of nonlinear solvers.
Ref. #7